Search Results

Documents authored by Sivukhin, Nikita


Document
Compressed Multiple Pattern Matching

Authors: Dmitry Kosolobov and Nikita Sivukhin

Published in: LIPIcs, Volume 128, 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)


Abstract
Given d strings over the alphabet {0,1,...,sigma{-}1}, the classical Aho - Corasick data structure allows us to find all occ occurrences of the strings in any text T in O(|T| + occ) time using O(m log m) bits of space, where m is the number of edges in the trie containing the strings. Fix any constant epsilon in (0, 2). We describe a compressed solution for the problem that, provided sigma <=m^delta for a constant delta < 1, works in O(|T| 1/epsilon log(1/epsilon) + occ) time, which is O(|T| + occ) since epsilon is constant, and occupies mH_k + 1.443 m + epsilon m + O(d log m/d) bits of space, for all 0 <= k <= max{0,alpha log_sigma m - 2} simultaneously, where alpha in (0,1) is an arbitrary constant and H_k is the kth-order empirical entropy of the trie. Hence, we reduce the 3.443m term in the space bounds of previously best succinct solutions to (1.443 + epsilon)m, thus solving an open problem posed by Belazzougui. Further, we notice that L = log binom{sigma (m+1)}{m} - O(log(sigma m)) is a worst-case space lower bound for any solution of the problem and, for d = o(m) and constant epsilon, our approach allows to achieve L + epsilon m bits of space, which gives an evidence that, for d = o(m), the space of our data structure is theoretically optimal up to the epsilon m additive term and it is hardly possible to eliminate the term 1.443m. In addition, we refine the space analysis of previous works by proposing a more appropriate definition for H_k. We also simplify the construction for practice adapting the fixed block compression boosting technique, then implement our data structure, and conduct a number of experiments showing that it is comparable to the state of the art in terms of time and is superior in space.

Cite as

Dmitry Kosolobov and Nikita Sivukhin. Compressed Multiple Pattern Matching. In 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019). Leibniz International Proceedings in Informatics (LIPIcs), Volume 128, pp. 13:1-13:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)


Copy BibTex To Clipboard

@InProceedings{kosolobov_et_al:LIPIcs.CPM.2019.13,
  author =	{Kosolobov, Dmitry and Sivukhin, Nikita},
  title =	{{Compressed Multiple Pattern Matching}},
  booktitle =	{30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)},
  pages =	{13:1--13:14},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-103-0},
  ISSN =	{1868-8969},
  year =	{2019},
  volume =	{128},
  editor =	{Pisanti, Nadia and P. Pissis, Solon},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/LIPIcs.CPM.2019.13},
  URN =		{urn:nbn:de:0030-drops-104847},
  doi =		{10.4230/LIPIcs.CPM.2019.13},
  annote =	{Keywords: multiple pattern matching, compressed space, Aho--Corasick automaton}
}
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail